
feat: merge-train/spartan#23253

Closed
AztecBot wants to merge 0 commits into next from merge-train/spartan

Conversation

@AztecBot AztecBot commented May 13, 2026

BEGIN_COMMIT_OVERRIDE
refactor(p2p): merge FastTxCollection into TxCollection with sequential pipeline (#23245)
refactor(publisher): bundle-level simulate; drop per-action enqueue sims (#23165)
refactor(stdlib): remove deprecated RevertCode/TxExecutionResult aliases (#23249)
test(e2e): fix race in 'proposer invalidates multiple checkpoints' (#23259)
fix: clean up old jobs regardless of pending status (#23260)
refactor(p2p): remove unused sendBatchRequest (#23273)
chore(p2p): remove proposal_tx_collector leftovers (#23276)
feat: slash truncated checkpoint proposals (#23250)
refactor: remove unused map in attestation pool (#23284)
chore(p2p): assert last block in checkpoint proposal is correct (#23274)
refactor(l1-tx-utils): use DateProvider for fail-fast timeout check (#23257)
feat(sandbox): support proposer pipelining in local network (#23277)
test(e2e): fix race in broadcasted_invalid_block_proposal_slash under pipelining (#23302)
fix(archiver): atomic getter for L2 tips (#23295)
fix(sequencer): use targetSlot in tryVoteWhenEscapeHatchOpen under pipelining (#23296)
fix(world-state): make fork close idempotent for pruned forks (#23298)
test(e2e): migrate passing tests to proposer pipelining (#23275)
chore: update dashboard (#23312)
chore: Revert "feat(sandbox): support proposer pipelining in local network" (#23313)
test: slash on bad attestation (#23184)
feat(slasher): per-slot data-withholding watcher (A-523, A-525) (#23116)
test(e2e): enable pipelining on e2e debug trace (#23301)
test(e2e): enable pipelining on l1-to-l2 test (#23300)
test(e2e): switch fee_settings to organic fee bumps under pipelining (#23303)
fix(ci): retry sqlite3mc-wasm download on transient DNS/TLS failures (#23333)
test(e2e): wait for real oracle rotation in fee_settings inflate helper (#23334)
test(e2e): anchor e2e_amm PXE to checkpointed tip under pipelining (#23336)
END_COMMIT_OVERRIDE

@ludamad ludamad left a comment

🤖 Auto-approved

@AztecBot

🤖 Auto-merge enabled after 4 hours of inactivity. This PR will be merged automatically once all checks pass.

@AztecBot AztecBot added this pull request to the merge queue May 13, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 13, 2026
@alexghr alexghr enabled auto-merge May 14, 2026 08:41
@alexghr alexghr added this pull request to the merge queue May 14, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 14, 2026
@spalladino spalladino requested review from a team and charlielye as code owners May 15, 2026 01:13
@PhilWindle PhilWindle enabled auto-merge May 15, 2026 09:04
@PhilWindle PhilWindle added this pull request to the merge queue May 15, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 15, 2026
@alexghr alexghr added this pull request to the merge queue May 15, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 15, 2026
AztecBot added a commit that referenced this pull request May 15, 2026
… composes

Both docs/examples/ts/docker-compose.yml and playground/docker-compose.yml
ran with SEQ_ENABLE_PROPOSER_PIPELINING=true (added in #23277), but the
sandbox is not yet configured to absorb pipelining's side effects:

- example_swap stalls on `wait for proven block N` because the proven tip
  stops advancing in an idle pipelined sandbox (the original PR #23253
  dequeue, http://ci.aztec-labs.com/b08ac48286302949).
- aztecjs_advanced fails on
  `Cannot get L1 to L2 messages for checkpoint N: inbox tree in progress is N, messages not yet sealed`
  because under pipelining `AztecNodeService.simulatePublicCalls` reads
  L1->L2 messages from an in-progress checkpoint
  (http://ci.aztec-labs.com/419c4513023a1799). This is the same
  `simulator + inboxLag` mismatch already TODO'd in e2e_bot.test.ts and
  several e2e_fees tests.

Disable the flag in the two sandbox composes to unblock the spartan
merge train; aztec-up scripts (basic_install / bridge_and_claim /
amm_flow) keep the flag and continue exercising pipelining in CI.
@PhilWindle PhilWindle added this pull request to the merge queue May 16, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 16, 2026
@PhilWindle PhilWindle added this pull request to the merge queue May 16, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 16, 2026
@PhilWindle PhilWindle added this pull request to the merge queue May 16, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 16, 2026
spalladino pushed a commit that referenced this pull request May 16, 2026
…23333)

## Motivation

The merge-train/spartan train PR (#23253) was dequeued from the merge
queue this morning because grind run `x9` failed during `compile_all`:

```
==> Downloading sqlite3mc-2.2.4-sqlite-3.50.4-wasm.zip
curl: (6) Could not resolve host: release-assets.githubusercontent.com
```

CI logs:
- compile_all: http://ci.aztec-labs.com/dea5c9f3fde10614
- x9-full driver: http://ci.aztec-labs.com/1778928278029512
- merge-queue run:
https://github.com/AztecProtocol/aztec-packages/actions/runs/25959931160

The branch CI on the same commit (run 25958983932) passed — only one of
the 10 grind shards hit the DNS flake, but the merge-queue fail-fast
tore the whole run down. The other 9 grinds and the ARM run were still
pending when the queue dropped #23253.

## Approach

Add curl retry flags to `yarn-project/sqlite3mc-wasm/scripts/vendor.sh`
so a one-off `Could not resolve host` (or any other transient curl
failure) doesn't fail the build. `--retry 5 --retry-delay 2
--retry-all-errors --retry-connrefused` gives ~10s of total backoff,
which is plenty for a momentary DNS hiccup but bounded for genuine
outages.

This is the only curl in the yarn-project build path that hits GitHub
release assets, so this is a targeted fix rather than a sweep.
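
The retry budget above can be checked with a small model. This is a hypothetical TypeScript sketch of fixed-delay retries, not the code in `vendor.sh`; the names `retryWithFixedDelay` and `worstCaseBackoffSeconds` are illustrative. It assumes curl's documented behavior that `--retry-delay` replaces the default exponential backoff with a fixed sleep before each retry.

```typescript
// Hypothetical model of `--retry 5 --retry-delay 2`; not the code in vendor.sh.
interface RetryResult<T> {
  value: T;
  attempts: number;
  sleptSeconds: number;
}

function retryWithFixedDelay<T>(
  attempt: () => T,
  maxRetries: number,
  delaySeconds: number,
  sleep: (seconds: number) => void = () => {}, // injected so the model runs instantly
): RetryResult<T> {
  let sleptSeconds = 0;
  for (let retry = 0; ; retry++) {
    try {
      return { value: attempt(), attempts: retry + 1, sleptSeconds };
    } catch (err) {
      // Give up once the retry budget is spent; the last error propagates.
      if (retry >= maxRetries) {
        throw err;
      }
      sleep(delaySeconds);
      sleptSeconds += delaySeconds;
    }
  }
}

// Worst case: every retry is used, so the total sleep is retries * delay.
function worstCaseBackoffSeconds(maxRetries: number, delaySeconds: number): number {
  return maxRetries * delaySeconds;
}
```

With five retries and a 2 s delay, the worst case is six attempts and 10 s of sleeping, which matches the "~10s of total backoff" figure.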

## Verification

`./bootstrap.sh ci` requires EC2 spawn and isn't runnable from inside
the container. Locally verified that `vendor.sh ensure` still downloads
and validates the pinned artifact correctly.

ClaudeBox log: https://claudebox.work/s/89dacb14037285cd?run=1
spalladino pushed a commit that referenced this pull request May 16, 2026
…er (#23334)

## Why

PR #23253 was dequeued from the merge queue when `merge-queue-heavy`'s
grind exercise hit a flake in `e2e_fees/fee_settings.test.ts`
(introduced by #23303, the head of `merge-train/spartan`). Failing
sub-test: `reproduces the stale fee snapshot race deterministically`. CI
log: http://ci.aztec-labs.com/cd390ea14cac1093

```
expect(received).toBeGreaterThan(expected)

Expected: > 1134386110000n
Received:   1067501300000n
  214 |       expect(bumpedMinFees.feePerL2Gas).toBeGreaterThan((lowerMinFees.feePerL2Gas * 11n) / 10n);
```

`bumpedMinFees` (`1067501300000`) was effectively the natural L2
baseline at that moment — no oracle rotation had occurred. The retry
inside `inflateL2FeesViaL1BaseFee` exited as soon as `after > before`
(with `before` captured at function entry), but the natural L2 fee
fluctuates between L1 blocks (EIP-1559 decay swings the L1 base-fee
sample), so a sub-percent upward drift satisfied the exit without the
oracle deadband (`LIFETIME - LAG = 3` L2 slots = 36 s) ever opening. The
test ran for only ~15 s before exiting, well short of the deadband.

The caller's `bumpedMinFees > lowerMinFees * 1.1` assertion then failed
because `lowerMinFees` was a separate snapshot taken earlier, and
natural drift between the two snapshots was below 10 %.

There is also a latent upper-bound issue: even on a successful rotation
the original `3x` L1 base-fee bump drives the L2 fee to ~2.0–2.5x once
EIP-1559 decay on the rotation-tx's block is applied, which would have
also failed `higherMinFees > bumpedMinFees` (where `higherMinFees =
lowerMinFees * 2n`).

## What

Three changes in
`yarn-project/end-to-end/src/e2e_fees/fee_settings.test.ts`:

- `inflateL2FeesViaL1BaseFee` takes a `reference: GasFees` parameter and
only returns when `after.feePerL2Gas >= reference * 13/10`. This
distinguishes a real oracle rotation (≥1.5x rise) from ambient noise
(within ±10%) and forces the loop to wait through the 36 s deadband.
- Retry budget grows from 60 s to 90 s to comfortably cover the deadband
plus a slot or two of margin.
- Test #2's synthetic `higherMinFees` grows from `lowerMinFees.mul(2)`
to `lowerMinFees.mul(4)`, giving unambiguous headroom over the realized
bumped fee while staying under the 6x default-padding cap so
`txWithDefaultPadding` is still the comparison point.

Test #1's bounds and semantics are unchanged; only the call site is
updated to pass `stableMinFees` as the reference.
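
The two exit conditions and the upper-bound headroom can be compared on the numbers from the failing run. This is a minimal sketch, not the code in `fee_settings.test.ts`: the helper names are hypothetical, `assumedBefore` is an invented sample (the log does not record `before`), and using `lowerMinFees` as the reference for test #2 is an assumption. `lowerMinFees` itself is recovered from the logged bound `(lowerMinFees * 11n) / 10n = 1134386110000n`.

```typescript
// Hypothetical helpers; names are illustrative, not taken from fee_settings.test.ts.

/** Old exit condition: any upward move counts, so ambient drift satisfies it. */
function roseAtAll(before: bigint, after: bigint): boolean {
  return after > before;
}

/** New exit condition: the fee must reach at least 1.3x the reference. */
function reachedRotationThreshold(reference: bigint, after: bigint): boolean {
  // Cross-multiplied to stay in integer arithmetic.
  return after * 10n >= reference * 13n;
}

// Values from the failing run.
const lowerMinFees = 1031260100000n; // recovered from the logged "Expected" bound
const bumpedMinFees = 1067501300000n; // logged "Received" value

// Caller's assertion needs more than a 10 % rise; the run saw roughly 3.5 %.
const callerAssertionPasses = bumpedMinFees > (lowerMinFees * 11n) / 10n;

// Assumption: `before` was a slightly lower sample taken at function entry.
const assumedBefore = 1067000000000n;
const oldLoopExits = roseAtAll(assumedBefore, bumpedMinFees);
const newLoopExits = reachedRotationThreshold(lowerMinFees, bumpedMinFees);

// Upper bound: take 2.5x, the top of the stated 2.0x to 2.5x range, as the realized bump.
const realizedBumpUpper = (lowerMinFees * 25n) / 10n;
const fitsUnder2x = lowerMinFees * 2n > realizedBumpUpper;
const fitsUnder4x = lowerMinFees * 4n > realizedBumpUpper;
```

Under these assumptions the old loop exits on drift (`oldLoopExits` is true) while the caller's assertion fails, and the new threshold keeps waiting (`newLoopExits` is false). A 2x `higherMinFees` does not clear a 2.5x bump; a 4x one does.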

## Test plan

- CI `merge-queue-heavy` (10 parallel grind runs of
e2e_fees/fee_settings)
- The PR-branch `ci-full-no-test-cache` already passed at the head
commit; the flake only surfaces under grind

Analysis:
https://gist.github.com/AztecBot/97861b48883eec686f5978a43a2082bb


ClaudeBox log: https://claudebox.work/s/89d3754c8b2b7140?run=1
@spalladino spalladino enabled auto-merge May 16, 2026 13:33
@spalladino spalladino added this pull request to the merge queue May 16, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 16, 2026
@spalladino spalladino added the claudebox Owned by claudebox. it can push to this PR. label May 16, 2026
spalladino pushed a commit that referenced this pull request May 16, 2026
…23336)

## Why

PR #23253 was dequeued (4th attempt) when `merge-queue-heavy` caught an
`e2e_amm.test.ts` setup tx getting dropped by a pipelining-driven chain
prune. CI log: `baec5a7453c20089`.

The wait-for-parent gate in
`CheckpointProposalJob.waitForValidParentCheckpointOnL1`
(`sequencer-client/src/sequencer/checkpoint_proposal_job.ts:398`)
**should** have blocked the discard, but it didn't — because a
`TestDateProvider` time warp from
`AnvilTestWatcher.syncDateProviderToL1IfBehind` landed **between** the
two `epochCache` reads in `Sequencer.work` (`sequencer.ts:217-218`) and
broke the pipelining invariant.

| step | wall-clock | `nowSeconds` | result |
|---|---|---|---|
| 1st `getEpochAndSlotInNextL1Slot` (`slot`) | ≈14:34:32.385 (pre-warp) | `1778942079` | next L1 ts `1778942080` → **slot 18** |
| (warp at 14:34:32.390 sets offset 7611 → 7610) | | | |
| 2nd `getTargetEpochAndSlotInNextL1Slot` (`targetSlot`) | ≈14:34:32.395 (post-warp) | `1778942080` | next L1 ts `1778942084` → **slot 19** → `+offset=1` → **targetSlot 20** |

Logged confirmation (gap = 2 instead of 1):

```
14:34:32.612  Preparing checkpoint proposal 19 for target slot 20 during wall-clock slot 18
              {nowSeconds=1778942079, slot=18, targetSlot=20, …}
```

With `slotNow = 18`, the gate at `checkpoint_proposal_job.ts:402` waits
on `waitForSyncedL2SlotNumber(slotNow)`. The archiver had already synced
past slot 18 — the wait returns immediately, far too early to see parent
ckpt 18 (which lands four seconds later at 14:34:36). The gate then sees
`checkpointedNumber=17, parentCheckpointNumber=18`, declares the parent
absent, and discards. Slot 20 expires uncheckpointed, archiver prunes
blocks 19/20, the inflight setup tx anchored to block 19 dies with
`Block header not found`.

Full timeline + log evidence:
https://gist.github.com/AztecBot/4863d10084dd20587bffcc43fd61dfee

## What

Scoped, test-only — per direction from Santiago. The previous "make
`checkpointed` the global PXE default" approach is reverted; only
`e2e_amm` is opted in:

```diff
-    } = await setup(4, { ...PIPELINING_SETUP_OPTS }));
+    } = await setup(4, { ...PIPELINING_SETUP_OPTS }, { syncChainTip: 'checkpointed' }));
```

The PXE option exists already (`yarn-project/pxe/src/config/index.ts`,
added in `75df5b5d44`). This is the same approach every other
pipelining-aware test uses (`e2e_p2p/*`, `e2e_epochs/*`,
`e2e_slashing/attested_invalid_proposal`). It anchors inflight txs to
the L1-confirmed tip so prunes on the proposed tip can't invalidate
them.

`PIPELINING_SETUP_OPTS` is left untouched — the pipelining migration of
`e2e_amm` in #23275 stays.

## Recommended follow-up (separate PR)

The real bug is the race in `Sequencer.work`. Worth fixing properly:

- **Snapshot the time once.** Add
`EpochCache.getCurrentAndTargetSlotInNextL1Slot()` that returns `{slot,
targetSlot, epoch, targetEpoch, ts, nowSeconds}` from a single
`dateProvider.nowInSeconds()` read; replace the two-call site in
`Sequencer.work`. Pipelining offset is a constant, so deriving
`targetSlot = slot + offset` from the same snapshot is trivial.
- **Defensive: wait on `targetSlot - 1`.**
`waitForValidParentCheckpointOnL1` should key off the parent's expected
build slot (`targetSlot - 1`) instead of `slotNow`, so the gate is
robust even if the invariant is broken upstream.
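
Both suggestions can be illustrated with a simplified model. This is a hypothetical sketch, not the real `EpochCache` or `Sequencer`: every function name below is invented, and the genesis timestamp and slot durations are assumptions chosen only so the output reproduces the slots in the table above.

```typescript
// Hypothetical, simplified model; the real EpochCache and Sequencer are more involved.
interface DateProvider {
  nowInSeconds(): number;
}

// Assumed constants, picked so the results match the table in this description.
const L1_GENESIS_TS = 1778941856; // assumption
const L1_SLOT_SECONDS = 4; // assumption
const L2_SLOT_SECONDS = 12; // assumption
const PIPELINING_OFFSET = 1;

function nextL1Timestamp(nowSeconds: number): number {
  const elapsed = nowSeconds - L1_GENESIS_TS;
  return L1_GENESIS_TS + (Math.floor(elapsed / L1_SLOT_SECONDS) + 1) * L1_SLOT_SECONDS;
}

function slotInNextL1Slot(nowSeconds: number): number {
  return Math.floor((nextL1Timestamp(nowSeconds) - L1_GENESIS_TS) / L2_SLOT_SECONDS);
}

// Two independent clock reads: the pattern that raced.
function readTwice(clock: DateProvider): { slot: number; targetSlot: number } {
  const slot = slotInNextL1Slot(clock.nowInSeconds());
  const targetSlot = slotInNextL1Slot(clock.nowInSeconds()) + PIPELINING_OFFSET;
  return { slot, targetSlot };
}

// One snapshot: targetSlot is derived from the same read as slot.
function readOnce(clock: DateProvider): { slot: number; targetSlot: number; nowSeconds: number } {
  const nowSeconds = clock.nowInSeconds();
  const slot = slotInNextL1Slot(nowSeconds);
  return { slot, targetSlot: slot + PIPELINING_OFFSET, nowSeconds };
}

// A clock whose reading advances by one second after the first read, as in the table.
function advancingClock(start: number): DateProvider {
  let reads = 0;
  return { nowInSeconds: () => start + (reads++ === 0 ? 0 : 1) };
}

const racy = readTwice(advancingClock(1778942079));
const safe = readOnce(advancingClock(1778942079));

// Defensive gate key: the parent's expected build slot rather than slotNow.
const parentBuildSlot = racy.targetSlot - 1;
```

In this model the two-read version yields `slot` 18 and `targetSlot` 20 (gap of 2), while the single snapshot yields 18 and 19. Keying the gate on `targetSlot - 1` gives 19 even in the racy case, so it would wait for the slot in which the parent is actually built.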

These aren't in this PR because they touch sequencer production code and
want their own review; the test-side workaround unblocks the merge-train
without changing the global PXE default.

## Test plan

The failure requires `merge-queue-heavy`'s 10-grind L1 contention to
surface reliably (single dev box can't reproduce). Change is a
single-arg addition; TS-trivial.

Analysis:
https://gist.github.com/AztecBot/4863d10084dd20587bffcc43fd61dfee

ClaudeBox log: https://claudebox.work/s/166e664eab264b04?run=3
AztecBot added a commit that referenced this pull request May 16, 2026
Both fail repeatedly on merge-train attempts under proposer pipelining
despite fix attempts (#23303, #23334 for fee_settings; #23336 for
e2e_amm). Skipping in .test_patterns.yml to land the train; to be
triaged and re-enabled (tracking issue assigned to spalladino).
AztecBot added a commit that referenced this pull request May 16, 2026
These four barretenberg C++ breaks arrived via next (git log
origin/next..HEAD shows 0 train commits touching them) and abort the
full CI build before the e2e suite runs, blocking merge-train #23253:

- common/fuzzer.hpp: add <cstring> for std::memcpy (bb-cpp-fuzzing)
- commitment_schemes_recursion/shplemini.test.cpp: include
  flavor/ultra_flavor.hpp for complete bb::UltraFlavor (bb-cpp-asan)
- smt_verification/util/smt_util.cpp: include
  stdlib_circuit_builders/ultra_circuit_builder.hpp for the full
  UltraCircuitBuilder_ template (bb-cpp-smt)
- api/api_chonk.cpp: clang-format-20 (bb-cpp-format-check)

Folded into the spartan train to unblock it per operator direction.
@ludamad ludamad closed this May 16, 2026
@ludamad ludamad force-pushed the merge-train/spartan branch from 4af2626 to db4ec58 Compare May 16, 2026 19:07

Labels

ci-full-no-test-cache ci-no-squash claudebox Owned by claudebox. it can push to this PR.
